METHOD AND SYSTEM FOR SUBJECT VIDEO STREAMING

CROSS REFERENCE TO RELATED APPLICATIONS.
[0001] This application claims the benefit of U.S.
Provisional Application No. 60/191,721 filed March 24,
2000, the disclosure of which is herein incorporated by
reference in its entirety.

[0002] This application is related to U.S. Provisional
Application No. 60/191,754, filed March 24, 2000 by Ping
Liu, which will herein be referred to as the related
application.

BACKGROUND OF THE INVENTION. 
FIELD OF THE INVENTION. 

[0003] The invention relates in general to the field
of interactive video communication, and more particularly
to networked multi-viewpoint video streaming. This
technology can be used for such interactive video
applications as E-commerce, electronic catalog, digital
museum, interactive education, entertainment and sports,
and the like.

DESCRIPTION OF RELATED ART.

[0004] Since the invention of television, a typical
video system has consisted of a video source (a live
video camera or a recording apparatus), a display
terminal, and a delivery means (optional if it is a local
application) comprising a transmitter, a channel, and a
receiver. We call this type of video technology the
objective video, in the sense that the sequential content
of the video clip is solely determined by what the camera
is shooting at, and that the viewer at the display
terminal has no control of the sequential order and the
content of the video.

[0005] A typical characteristic of most objective
videos is that the visual content is prepared from a
single viewpoint. In recent years there have been many
new approaches to producing multi-viewpoint videos. A
multi-viewpoint video clip simultaneously captures a
scene during a period of time, be it still or in
motion, from multiple viewpoints. The result of this
multi-viewpoint capturing is a bundle of correlated
objective video threads. One example of such an
apparatus is an Integrated Digital Dome (IDD) as
described in the related application.

[0006] With multi-viewpoint video content, it is
possible for a viewer to switch among different viewpoints
and so to watch the event in the scene from different
angles. Imagine a display terminal that is connected to
a bundle of multi-viewpoint objective video threads.
Imagine further that the content of this multi-viewpoint
bundle is about a still scene in which there is no object
motion, camera motion, nor changes in luminance
condition. In other words, every objective video thread
in the multi-viewpoint bundle contains a still image. In
this case, a viewer can still produce a motion video on
the display terminal by switching among different images
from the bundle. This is a video sequence not produced
by the content itself but by the viewer. The temporal
order of each frame's occurrence in the video sequence
and the duration for each frame to stay on the display
screen are solely determined by the viewer at his/her
will. We call this type of video the subjective video.
In general, subjective video refers to those sequences of
pictures where changes in subsequent frames are caused not
by objective changes of the scene but by changes of
camera parameters. A more general situation is the mixed
objective and subjective video, which we call ISOVideo
(integrated subjective and objective video).

[0007] A main difference between objective video and
subjective video is that the content of an objective
video sequence, once it is captured, is completely
determined, whereas the content of a subjective video is
determined by both the capturing process and by the
viewing process. The content of a subjective video when
it is captured and encoded is referred to as the still
content of the subjective video, or the still subjective
video. The content of a subjective video when it is
being played at the viewer's will is referred to as the
dynamic content of the subjective video, or the dynamic
subjective video.

[0008] The benefit of subjective video is that the end
user plays an active role. He/she has full control
over how the content is viewed, through playing with
parameters such as viewpoint and focus. This is
especially useful when the user wants to fully inspect an
object of interest, as in the process of product
visualization in E-commerce.

[0009] With such apparatuses as IDD, the still content
of subjective video can be effectively produced. There
are two general modes to view the subjective video: local
mode and remote mode. In the local mode, the encoded
still content of subjective video is stored on some
randomly accessible mass storage, say a CD-ROM. Then,
upon request, a decoder is used to decode the still
content into an uncompressed form. Finally, an
interactive user interface is needed that displays the
content and allows the viewer to produce the dynamic
subjective video. In this mode, one copy of still
subjective video is dedicated to serve one viewer.

[0010] In the remote mode, the encoded still content
of subjective video is stored with a server system such
as a fast computer system. Upon request, this server
system delivers the still subjective video to a plurality
of remote display terminals via an interconnection
network such as an IP network. If the play process
starts after the still content is completely downloaded,
then the rest of the process is exactly the same as in
the case of local mode. When the still content file size
is too large to be transmitted via low-bandwidth
connections in a tolerable amount of time, download-
and-play is not a practical solution. If the play
process is partially overlapped in time with the
transmission, so that the play process may start with a
tolerable time lag after the download starts, we are
dealing with subjective video streaming, which is the
topic of this invention. In the remote mode (or
specifically the streaming mode), one copy of still
subjective video on the server serves a multiplicity of
remote users, and one copy of still subjective video may
yield many different and concurrent dynamic subjective
video sequences.

[0011] It can be seen that the streaming mode shares
many functional modules with the local mode, such as
video decoding and display. Still, there are new
challenges with the streaming mode, the main challenge
being that not all of the still contents are available
locally before the streaming process completes. In this
case, not all of the dynamic contents can be produced based
on local still contents, and the display terminal has to
send requests to the server for those still contents that
are not available locally. The invention relates to a
systematic solution that provides a protocol for
controlling this streaming process, a user interface that
allows the viewer to produce the dynamic content, and a
player that displays the dynamic subjective video
content.

[0012] At present, there are mainly two types of video
streaming technologies: single-viewpoint video streaming
(or objective video streaming) and graphic streaming.

Objective video streaming.
[0013] In single-viewpoint video streaming (or
objective video streaming), the content to be transmitted
from server to client is a frame sequence made of single-
viewpoint video clips. These video clips are frame
sequences pre-captured by a camera recorder, or are
computer generated. Typical examples of objective video
streaming methods are real-time transport protocol (RTP)
or real-time streaming protocol (RTSP), which provide
end-to-end delivery services for data with real-time
characteristics, such as interactive audio and video.
During the streaming process, the objective video is
transferred from server to client frame by frame.
Certain frames can be skipped in order to maintain a
constant frame rate. The video play can start before the
transmission finishes.

[0014] A main difference between RTP/RTSP and the
invented subjective video streaming lies in the content:
RTP/RTSP only handles sequential video frames taken from
one viewpoint at one time, while subjective video
streaming deals with pictures taken from a set of
simultaneous cameras located in a 3D space.

[0015] Another difference is that RTP/RTSP is
objective, which means the client plays a passive role.
The frame order, frame rate, and viewpoint of the camera
are hard coded at recording time, and the client has no
freedom to view the frames in an arbitrary order or from
an arbitrary viewing angle. In other words, the server
plays a dominating role. In subjective video, the end
client has the control to choose the viewpoint and
displaying order. At recording time, multi-viewpoint
pictures taken by the multiple cameras are stored on the
server, and the system lets the end user control the
streaming behavior. The server plays a passive role.

Graphic streaming. 
[0016] Typical examples of graphic streaming are
MetaStream and Cult3D, two commercial software packages.
In this approach there is a 3D graphics file pre-produced
and stored on the server for streaming over the Internet.
The file contains the 3D geometry shape and the textural
description of an object. This 3D model can be created
manually or semi-automatically. The streaming process in
these two examples is not a true network streaming, since
there is no streaming server 130 existent in the whole
process. There is a client system which is usually a
plug-in to an Internet browser and which downloads the
graphics file and displays it while downloading is still
in progress. After the whole 3D model is downloaded, the
user can freely interact with the picture by operations
such as rotation, pan and zoom in/out.

[0017] MetaStream, Cult3D, and the like deliver a 3D
picture of an object through a different approach from
the invented method: the former is model based whereas
the latter is image based. For the model-based
approaches, building the 3D model for a given object
usually takes a lot of computation and man-hours, and
does not always assure a solution. Also, for many items
such as a teddy bear toy it is very hard or impossible to
build a 3D model in a practical and efficient way. Even
if a 3D model can be built, there is a significant visual
and psychological gap for end viewers to accept the model
as a faithful image of the original object.

SUMMARY OF THE INVENTION.
[0018] In a preferred embodiment of the invention,
there is no 3D model involved in the entire process. All
the pictures constituting the still content of the
subjective video are real images taken from a
multiplicity of cameras from different viewpoints. A 3D
model is a high level presentation the building of which
requires analysis of the 3D shape of the object. In
contrast, in the above-identified preferred embodiment of
the invention, a strictly image processing approach is
followed.

[0019] Given an object or scene, the file size of the
pictorial description of it according to the invention is
normally larger than in those model-based approaches.
However, the difference in size does not represent a
serious challenge for most of the equipment for today's
Internet users. By means of the streaming technology
according to the invention, the end user will not need to
download the whole file in order to see the object.
He/she is enabled to see the object from some viewpoints
while the download for other viewpoints is taking place.

[0020] Apple Computers produced a technology called
QTVR (QuickTime Virtual Reality). This technology can
deal with multi-viewpoint and panoramic images. There
are thus certain superficial similarities between
QTVR and the invented method. QTVR supports both model-
based and image-based approaches. Even so, there are
many differences between QTVR and the invented method.
QTVR and its third party tools require authoring work
such as stitching images taken from multiple viewpoints.
Such operations typically cause nonlinear distortions
around the boundaries of the patches. Operations
according to the invention, however, do not involve any
stitching together of images from different viewpoints.
QTVR does not have a streaming server 130, and so the
user needs to download the whole video in order to view
the object from different aspects. In the invented
method, the streaming server 130 and client together
provide a system of bandwidth-smart controls (like wave-
front, scheduler, caching, etc.) that allow the client to
play the subjective video while the download is still
taking place.

BRIEF DESCRIPTION OF DRAWINGS. 
[0021] Fig. 1 illustrates multi-viewpoint image
capturing and coding.

[0022] Fig. 2 shows a file format for a still
subjective video content, that is, a file in the video at
will format.

[0023] Fig. 3 shows the content of an offset table
produced during the content production process and stored
in the video at will file header.

[0024] Fig. 4 illustrates the basic steps involved in
subjective video streaming according to the invention.

[0025] Fig. 5 is a state diagram to illustrate the
life cycle of a video at will session.

[0026] Fig. 6 is a logic diagram showing the operation 
of the server in synchronous mode. 

[0027] Fig. 7 is a logic diagram showing the operation
of the server in an asynchronous mode. 

[0028] Fig. 8 shows the organization of the client
system for subjective video streaming. 

[0029] Fig. 9 shows the construction of a viewpoint
map.

[0030] Fig. 10 is a logic diagram showing the
operation of the client.

[0031] Fig. 11 is a logic diagram showing the
operation of the scheduler.

[0032] Figures 12(a) and (b) are explanatory figures 
for explaining a wave-front model and the accommodation 
of a user's new center of interest. 

[0033] Fig. 13 illustrates exemplary fields in a video
at will request.

[0034] Fig. 14 shows basic operations which may be
available according to various embodiments of the
invention while playing a subjective video.

[0035] Fig. 15 is a logic diagram for illustrating the
operation principle of an E-Viewer controller.

[0036] Fig. 16 is a diagram for explaining different
revolution speeds.

[0037] Fig. 17 is a diagram relating to the streaming
of panoramic contents.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS.

[0038] Fig. 1 illustrates the basic components of the
invented subjective video streaming system 100 and its
relation with the content production process. The
content production procedure contains a multi-viewpoint
image capturing step and a coding (compression) step.
These two steps can be accomplished by means of an
integrated device 180 such as the IDD described in the
related application. The encoded data represents the
still content of a subjective video and is stored on a
mass storage 170 such as a disk that is further connected
to the host computer 110 of the streaming system 100.

[0039] The subjective video streaming system 100
contains a streaming server 130 and a plurality of
streaming clients 160 connected to the server 130 via an
interconnection network, typically the Internet. The
streaming server 130 is a software system that resides on
a host computer 110. It is attached to a Web server 120
(e.g., Apache on Unix or IIS on Windows NT). The Web
server 120 decides when to call the streaming server 130
to handle streaming-related requests, via proper
configurations such as MIME settings in the server
environment.

[0040] The streaming client 160 is a software module
resident on the client machine 140, which can be a personal
computer or a Web TV set-top box. It can be configured
to work either independently or with Internet browsers
such as Netscape or IE. In the latter case, the MIME
settings in Netscape or IE should be configured so that
the browser knows when the subjective video streaming
functions should be launched.

[0041] Lower level transmission protocols such as
TCP/IP and UDP are required to provide the
connection and data package delivery functions. The HTTP
protocol is used for the browser to establish a connection
with the Web server 120. Once the connection is set up,
a streaming session is established and the subjective
video streaming protocol takes over the control of the
streaming process.

VAW FILE.

[0042] The subjective video streaming server 130 is
connected with a mass storage device 170, usually a hard
disk or laser disk. The still subjective video contents
are stored on this storage device 170 in the unit of
files. Fig. 2 shows the file format of a still
subjective video content. For the rest of this paper
this file format is referred to as VAW (Video At Will)
file. In order to understand this file structure we need
to review the construction principle of a capture and
coding device 180, such as the IDD as described in the
related application. A typical device 180 is a dome
structure placed on a flat platform. On this dome
hundreds of digital cameras are placed centripetally
following a certain mosaic structure, acquiring
simultaneous pictures from multiple viewpoints. While
coding (compressing) these multi-viewpoint image data the
device divides all viewpoints into processing groups
(PGs). In each PG there is a generally central viewpoint
(C-image) and a set of (usually up to six) surrounding
viewpoints (S-images). One IDD typically has 10-50 PGs.

[0043] The output from such a capturing and coding
device may be seen in Fig. 2. At the top level of
syntax, a VAW file 200 contains a file header 210
followed by the PG code streams 220. There is no
particular preference for the order of the PGs within the
code stream. The file header 210 contains generic
information such as image dimensions, and an offset table
300 (see Fig. 3). A PG code stream 220 includes a PG
header 230 and a PG data body 240. The PG header 230
specifies the type of PG (how many S-images it has), the
C-image ID, and coding parameters such as the color
format being used, what kind of coding scheme is used for
this PG, and so on. Note that different PGs on the same
IDD may be coded using different schemes, e.g., one using
DCT coding and another using sub-band coding. It will be
understood that there is no regulation on how to assign
the C-image ID. Each PG data body 240 contains a C-image
code stream 250 followed by up to six S-image code
streams 260. No restriction is required on the order of
those S-image code streams, and any preferred embodiment
can have its own convention. Optionally, each S-image
may also have an ID number.

[0044] Candidate coding schemes for compressing the
C-image and S-images can be standard JPEG or proprietary
techniques. If a progressive scheme is used, which is
popular for sub-band image coding, the code stream of the
C-image and/or S-images can further contain a base layer
and a set of enhancement layers. The base layer contains
information of the image at a coarse level, whereas the
enhancement layers contain information at finer levels of
resolution. Progressive coding is particularly suitable
for low bit-rate transmission.

[0045] Fig. 3 shows the content of the offset table
300. This table is produced during the content
production process and is stored in the VAW file header
210. It records the offset (in bytes) of the start of
each PG code stream from the start of the VAW file. It is
important information for the server to fetch data from
the VAW file 200 during the streaming process.
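
The role of the offset table can be illustrated with a short sketch. The description does not specify the byte layout of the table or of the header, so everything concrete here is an assumption made for illustration: the table is modeled as a mapping from PG ID to byte offset, the file as an in-memory byte stream, and `read_pg_stream` is a hypothetical helper name. Because the PGs may appear in any order, the end of a PG code stream is taken to be the next larger offset in the table, or the end of the file.

```python
import io

def read_pg_stream(vaw_file, offset_table, pg_id, file_size):
    """Return the raw code stream of one PG from an open VAW-like file.

    offset_table maps PG ID -> byte offset of that PG's code stream,
    measured from the start of the file (the role of table 300).
    The mapping layout itself is an assumption, not the patent's format.
    """
    start = offset_table[pg_id]
    # The stream ends where the next PG in file order begins, or at EOF.
    later = [off for off in offset_table.values() if off > start]
    end = min(later) if later else file_size
    vaw_file.seek(start)
    return vaw_file.read(end - start)

# Toy file: a 4-byte stand-in header followed by three PG code streams.
data = b"HEAD" + b"aaaa" + b"bb" + b"cccccc"
table = {7: 4, 2: 8, 5: 10}  # hypothetical PG IDs, in arbitrary ID order
pg_two = read_pg_stream(io.BytesIO(data), table, 2, len(data))
```

With the table held in memory, a request for any PG costs one seek and one read, which is consistent with the light server workload described for the streaming process.
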
ORIGIN PG. 

[0046] For every VAW file 200 there is a unique PG,
called the origin. Its central image corresponds to a
particular viewpoint among all possible viewpoints. The
origin is the start point of a streaming process, and is
client-independent. In other words, the origin provides
the first image shown on a client's display for all
clients who have asked for this VAW file. Different VAW
files may have different origins, depending on the
application. For on-line shopping applications, the
origin could be the specific appearance of the product
that the seller wants the buyer to see at the first
glance.

PASSIVE STREAMING PRINCIPLE.

[0047] Fig. 4 illustrates the basic steps involved in
the subjective video streaming. The basic idea is that
the server 130 plays a passive role: whenever the client
160 wants a picture, the server retrieves it from the VAW
file 200 and sends it to the client. The server will not
send any command or request to the client, except image
data. The client plays a dominating role: it controls
the pace of streaming and commands the server on what
data are to be transmitted. This is different from the
case of objective video streaming where the server
usually has the domination. This passive streaming
principle helps to dramatically reduce the complexity
of the server design, and therefore significantly
improves the server capacity.

[0048] A subjective video streaming process according
to an embodiment of the invention may operate as follows.
The client 160 initiates the streaming process by sending
a request to the server 130 via HTTP. By analyzing the
request the server 130 determines which VAW file 200 the
client 160 wants, and opens this VAW file 200 for
streaming. The first batch of data sent from the server
130 to the client 160 includes the session description
and the image data of the origin PG. Once a VAW file 200
is open, an offset table 300 is read from the file header
210 and stays in the memory to help in locating a
requested PG. Then the server 130 waits until the next
request comes. The client 160 keeps pushing the
streaming by continuously submitting new GET requests for
other PG data. In this process a scheduler 820 (not
shown in Fig. 4) helps the client determine which PG is
most wanted for the next step. The client passes the
received data to an E-Viewer 410 for decoding and
display. Whenever the client 160 wants to terminate the
streaming, it sends an Exit request to the server and
leaves the session.

SERVER.

[0049] In a passive streaming process, the only thing
that the server 130 needs to do is to listen to the
incoming requests and prepare and put PG data into a
communication buffer for delivery. The server 130
manages these tasks through running a set of VAW
sessions.

[0050] Fig. 5 illustrates the life cycle of a VAW
session. Associated with each VAW session there is a VAW
file 200 and an offset table 300. They have the same
life cycle as the VAW session. When the server 130
receives the first request for a specific VAW file 200,
it creates a new VAW session, and opens the associated
VAW file 200. From the header 210 of the VAW file 200
the offset table 300 is read into the memory. Multiple
clients can share one VAW session. If a plurality of
clients want to access the same VAW file, then this VAW
file is opened only once when the first client comes.
Accordingly, the associated offset table 300 is read and
stays in the memory once the VAW file 200 is open. For
any subsequent requests, the server will first check if
the wanted VAW file 200 is already open. If yes, then the
new client simply joins the existing session. If not,
then a new session is created. There is a timer
associated with each session. Its value is incremented
by one after every predefined time interval. Whenever a
new request to a session occurs, no matter from which
client, the server resets the associated timer to zero.
When the timer value reaches a certain predefined
threshold, a time-out signal is generated which reminds
the server to close the session and release the offset
table.
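
The session life cycle described above can be sketched as a small table of shared sessions with idle timers. This is only an illustrative sketch under stated assumptions: the class and method names (`VawSession`, `SessionTable`, `on_request`, `tick`) are hypothetical, the threshold is an arbitrary tick count, and the actual file opening and offset table loading are omitted.

```python
class VawSession:
    """One shared session per open VAW file (hypothetical structure)."""

    def __init__(self, path):
        self.path = path
        self.timer = 0  # idle ticks since the last request


class SessionTable:
    def __init__(self, timeout_ticks=3):
        self.timeout_ticks = timeout_ticks  # assumed threshold value
        self.sessions = {}

    def on_request(self, path):
        # Join the existing session if the file is already open,
        # otherwise create a new one. Any request resets the timer.
        session = self.sessions.get(path)
        if session is None:
            session = VawSession(path)
            self.sessions[path] = session
        session.timer = 0
        return session

    def tick(self):
        # Called once per predefined interval: increment every timer
        # and close sessions whose timer has reached the threshold.
        for path in list(self.sessions):
            session = self.sessions[path]
            session.timer += 1
            if session.timer >= self.timeout_ticks:
                del self.sessions[path]


table = SessionTable(timeout_ticks=2)
first = table.on_request("teddy.vaw")   # hypothetical file name
second = table.on_request("teddy.vaw")  # second client joins the session
```

Because the timer is reset by a request from any client, a session stays open for as long as at least one client keeps requesting data from its file.
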

[0051] Whenever a new client joins a VAW session, the
first data pack it receives is a session description,
including information such as the type of the data capture
dome, picture resolution information, etc. All this
information is found in the header 210 of the VAW file
200. The immediate next data pack contains the origin
PG. For transmission of the following data packs, there
are two methods: synchronous mode and asynchronous mode.

[0052] Fig. 6 shows the control logic of the server in
synchronous mode. The basic idea of this mode is that
the client 160 has to wait until the PG data for the last
GET command is completely received before it issues a new
GET request. In this mode, the server does not verify
whether the data for the last request has safely arrived
at the client's end before it transmits a new pack.
Therefore the workload of the server is minor: it simply
listens to the communication module for new requests and
sends out the data upon request.

[0053] Data streaming in the asynchronous mode is
faster than in synchronous mode, with additional workload
for the server (Fig. 7). In this mode, the client 160 will
send a new request to the server 130 whenever a decision
is made, and does not have to wait until the data for
previous request(s) is completely received. To manage
this operation the server sets up a streaming queue Q for
each client, recording the PG tasks to be completed. For
each new client, two control threads are created at the
start of transmission. The streaming thread reads one PG
ID at a time from the head of the queue and processes it,
and the housekeeping thread listens to the incoming
requests and updates the queue. In this mode, the
incoming request contains not only a PG ID but also a
priority level. The housekeeping thread inserts the new
request into Q so that all PG IDs in Q are arranged
according to the descending order of priority level. If
several PGs have the same priority level, a FIFO (first
in first out) policy is assumed.
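
The ordering rule for the streaming queue Q can be sketched with a binary heap. This is a minimal sketch, not the embodiment's implementation: the class name `StreamingQueue` is hypothetical, a larger number is assumed to mean a higher priority, and the two-thread structure is left out so that only the ordering is shown. An arrival counter gives the FIFO behavior among requests of equal priority.

```python
import heapq
import itertools


class StreamingQueue:
    """Per-client queue of PG IDs: highest priority first, FIFO on ties."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing stamp

    def put(self, pg_id, priority):
        # heapq is a min-heap, so the priority is negated to pop the
        # highest level first; the arrival stamp breaks ties in FIFO order.
        heapq.heappush(self._heap, (-priority, next(self._arrival), pg_id))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = StreamingQueue()
q.put(pg_id=3, priority=1)
q.put(pg_id=9, priority=5)
q.put(pg_id=4, priority=5)  # same level as PG 9, arrived later
q.put(pg_id=1, priority=2)
order = [q.get() for _ in range(len(q))]
```

In a threaded server the housekeeping thread would call `put` and the streaming thread would call `get`, with a lock or a thread-safe queue around the heap; that synchronization is deliberately omitted here.
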
CLIENT SYSTEM. 

[0054] Fig. 8 shows the organization of the client
system 140 for subjective video streaming. Since the
client system 140 plays a dominating role in passive
streaming of still subjective video content, it has a
more complicated organization than the server system 110.
It includes a streaming client 160, an E-Viewer 410, and
a communication handler 150. The function of the
communication handler 150 is to deal with data
transmission. In an embodiment this function is
undertaken by an Internet browser such as Netscape or
Internet Explorer. Accordingly, the E-Viewer 410 and the
streaming client 160 are then realized as plug-ins to the
chosen Internet browser. The task of the streaming
client 160 is to submit data download requests to the
server 130. The task of the E-Viewer 410 is to decode
the received image data and to provide a user interface
for displaying the images and for the end user to play
the subjective video.

[0055] The client system 140 is activated when the
end user issues (via an input device 880) the first
request for a specific VAW file 200. This first request
is usually issued through the user interface provided by
the Internet browser 150. Upon this request, the
streaming client 160 and the E-Viewer 410 are launched
and the E-Viewer 410 takes over the user interface
function.

VIEWPOINT MAP.

[0056] In this client system 140, there is an
important data structure, the viewpoint map 830, shared
by the streaming client 160 and the E-Viewer 410. Fig. 9
shows its construction. It has a table structure with
four fields and is built by the streaming client 160
after the session description is received. This session
description contains the configuration information of the
viewpoints, which enables the streaming client 160 to
initialize the viewpoint map 830 by filling the PG-ID and
the Neighboring PG fields for all PGs. The Current
Viewpoint field indicates whether any of the viewpoints
in a PG, including C-viewpoint or S-viewpoint, is the
current viewpoint. At any time or moment there is
exactly one PG that has YES in its current viewpoint
field. Initially all PGs are NOT the current viewpoint.
Once the origin PG is received, its current viewpoint
field is set to YES. The current PG is determined by the
end user, and is specified by the E-Viewer 410.

[0057] In non-progressive transmission, the local
availability field indicates whether a PG is already
completely downloaded from the server. In progressive
transmission, this field indicates which base and/or
enhancement layers of a PG have been downloaded.
Initially the streaming client 160 marks all PGs as NO
for this field. Once the data of a PG is completely
received, the E-Viewer 410 will set the corresponding PG
entry in the viewpoint map 830 to YES (or will register
the downloaded base or enhancement layer in this field in
the case of progressive transmission).
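
The four fields of the viewpoint map can be sketched as a small record type. This is a sketch of the non-progressive case only, under stated assumptions: the names `ViewpointEntry`, `build_viewpoint_map`, `set_current` and `mark_local` are hypothetical, and the neighbor configuration is assumed to arrive in the session description as a mapping from PG ID to neighboring PG IDs.

```python
from dataclasses import dataclass


@dataclass
class ViewpointEntry:
    pg_id: int             # PG-ID field
    neighbors: list        # Neighboring PG field
    current: bool = False  # Current Viewpoint field (YES/NO)
    local: bool = False    # Local Availability field (YES/NO)


def build_viewpoint_map(neighbor_config):
    """Initialize the map from the (assumed) session description data."""
    return {pg: ViewpointEntry(pg, list(nbrs))
            for pg, nbrs in neighbor_config.items()}


def set_current(vmap, pg_id):
    # Keep exactly one PG flagged as the current viewpoint.
    for entry in vmap.values():
        entry.current = (entry.pg_id == pg_id)


def mark_local(vmap, pg_id):
    # Called once the data of a PG has been completely received.
    vmap[pg_id].local = True


vmap = build_viewpoint_map({0: [1, 2], 1: [0], 2: [0]})
mark_local(vmap, 0)   # origin PG received
set_current(vmap, 0)  # origin PG becomes the current viewpoint
```

For progressive transmission the `local` flag would be replaced by a record of which base and enhancement layers have arrived, for example a set of layer indices.
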
STREAMING CLIENT. 

[0058] Fig. 10 illustrates the control logic of the
streaming client 160. When it starts operating, the
first VAW file 200 request has been submitted to the
server 130 by the Internet browser 150. Therefore, the
first thing that the streaming client 160 needs to do is
to receive and decode the session description. Then,
based on the session description, the viewpoint map 830
can be initialized. The streaming client 160 then enters
a control routine referred to herein as the scheduler
820.

SCHEDULER.

[0059] To some extent, the scheduler 820 is the heart
that drives the entire subjective video streaming system.
This is because any complete interaction cycle
between the server 130 and client 160 starts with a new
request, and except for the very first request on a
specific VAW file 200, all subsequent requests are made
by the scheduler 820.

[0060] Fig. 11 shows the operation of the scheduler
820. Once activated, the scheduler 820 keeps looking at
the viewpoint map 830 to select a PG ID for download at
the next step. If all PGs are found already downloaded,
or the end user wants to quit from the session, the
scheduler 820 terminates its work. Otherwise, the
scheduler 820 will select, from those non-local PGs, a PG
that is believed to be most wanted by the end user.
There are different policies for the scheduler 820 to
make such a prediction of the user's interest. In one
embodiment a wave-front model is followed (see Fig. 12).
If the PG that covers the current viewpoint is not local,
it is processed with top priority.
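
One way to read the selection step is as a breadth-first search outward from the current PG over the Neighboring PG links, returning the first PG that is not yet local. The sketch below is an interpretation offered for illustration, not the scheduler of Fig. 11: the function name `next_pg` and the plain dict and set arguments are assumptions, and the search only reaches PGs connected to the current one through neighbor links.

```python
from collections import deque


def next_pg(neighbors, local, current):
    """Pick the next PG ID to request, or None if nothing is left.

    neighbors: dict mapping PG ID -> list of neighboring PG IDs
    local:     set of PG IDs that are already downloaded
    current:   PG ID that covers the current viewpoint
    """
    seen = {current}
    frontier = deque([current])
    while frontier:
        pg = frontier.popleft()
        if pg not in local:
            # The current PG is visited first, so it gets top priority;
            # after that, nearer rings are exhausted before farther ones.
            return pg
        for nb in neighbors[pg]:
            if nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return None


# Hypothetical four-PG layout: 0 is adjacent to 1 and 2, which share 3.
layout = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
first_choice = next_pg(layout, set(), current=0)
```

A return value of `None` corresponds to the case in which all reachable PGs are already downloaded and the scheduler terminates its work.
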

[0061] In synchronous streaming mode, the client
system 140 will wait for the completion of transmission
of the last data pack it requested before it submits a
new request. In this case, when the scheduler 820 makes
its choice for the new PG ID, it waits for the
acknowledgement from the E-Viewer controller 840 about
the completion of transmission. Then a new request is
submitted. In asynchronous mode, there is no such time
delay. The scheduler 820 simply keeps submitting new
requests. In practice, the submission process of new
requests cannot run too far ahead of the download
process. A ceiling value is set that limits the maximum
length of the queue on the server. In an embodiment this
value is chosen to be eight.
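
The ceiling on pending requests can be sketched as a simple bounded
window kept on the client side. The class and method names below are
hypothetical; only the ceiling value of eight comes from the embodiment
described above.

```python
from collections import deque

class RequestWindow:
    """Caps the number of requests submitted but not yet completed."""

    def __init__(self, ceiling=8):
        self.ceiling = ceiling      # maximum number of queued requests
        self.outstanding = deque()  # PG IDs requested and still in flight

    def can_submit(self):
        return len(self.outstanding) < self.ceiling

    def submit(self, pg_id):
        """Record a new request; return False if the ceiling is reached."""
        if not self.can_submit():
            return False
        self.outstanding.append(pg_id)
        return True

    def complete(self, pg_id):
        """Call when the data pack for pg_id has been fully received."""
        self.outstanding.remove(pg_id)
```

Under this sketch, a ceiling of one behaves like the synchronous mode,
since a new request can only be submitted after the previous data pack
has completed.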
WAVE-FRONT MODEL.

[0062] Fig. 12 illustrates the principle of the
wave-front model. Maximum bandwidth utilization is an
important concern in the subjective video streaming
process. With limited bandwidth, the scheduling policy
is designed to ensure that the most wanted PGs are
downloaded with the highest priority. Since the "frame
rate" and the frame order of a subjective video are not
stationary and change at the viewer's will from time to
time, the scheduler 820 will typically deal with the
following two scenarios.

[0063] Scenario One: the viewer stares at a specific
viewpoint and does not change viewpoint for a while.
Intuitively, without knowing the user's intention for the
next move, the scheduler 820 can only assume that the
next intended move could be in any direction. This
means that the PGs to be transmitted for the next batch
are around the current PG, forming a circle with the
current PG as the center. If all PG IDs on this circle
are submitted, and the user still does not want to change
viewpoint, the scheduler 820 will process the PGs on a
larger circle. This leads to the so-called wave-front
model (Fig. 12(a)).

[0064] Scenario Two: a viewpoint change instruction is
issued by the E-Viewer 410. In this case, the shape of
the wave front is changed to accommodate the user's new
center of interest (Fig. 12(b)). One can imagine that at
the very initial stage of a streaming session, the shape
of the wave front is a perfect circle with the origin PG
as the center. Once the user starts playing the
subjective video, the wave front is gradually deformed
into an arbitrary shape.
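
The two scenarios can be illustrated with a short sketch that orders
PGs ring by ring around a center and re-centers the ordering when the
viewpoint changes. The grid layout of PG IDs and the use of Chebyshev
distance to define a "circle" are assumptions for illustration only.

```python
def wave_front_order(pg_ids, center, already_requested=()):
    """Order PG IDs ring by ring around `center`.

    PG IDs are hypothetical (row, col) grid positions. Ring k holds the
    PGs at Chebyshev distance k from the center (an assumed metric).
    PGs in `already_requested` are left out of the result.
    """
    skip = set(already_requested)

    def ring(pg):
        return max(abs(pg[0] - center[0]), abs(pg[1] - center[1]))

    # Sort by ring first; the PG ID breaks ties within a ring.
    return sorted((pg for pg in pg_ids if pg not in skip),
                  key=lambda pg: (ring(pg), pg))
```

In Scenario One the scheduler would simply walk down this list. In
Scenario Two it would call the function again with the new center and
the PGs already requested, which is what deforms the front away from a
perfect circle around the origin PG.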
REQUEST FORMAT. 

[0065] As shown in Fig. 13, a typical VAW request 1300
should include, but is not restricted to, the following
fields:

• Session ID: tells the server to which VAW session the
current request belongs.

• PG ID: tells the server where the new viewpoint is.

• PG Priority: tells the server how urgently this new PG
is wanted.

• PG Quality: if a progressive scheme is used, the PG
quality factor specifies for which base or enhancement
layer(s) the current request is made.
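
The four fields listed above can be sketched as a small record with a
binary encoding. The field widths, the byte order, and the names
`VawRequest`, `pack`, and `unpack` are assumptions; the actual layout
of the request 1300 is not specified here.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout: two unsigned 32-bit integers followed by two
# unsigned bytes, in network byte order (10 bytes in total).
_LAYOUT = struct.Struct("!IIBB")

@dataclass(frozen=True)
class VawRequest:
    session_id: int  # which VAW session the request belongs to
    pg_id: int       # which PG (viewpoint) is wanted
    priority: int    # level of urgency
    quality: int     # requested base or enhancement layer

    def pack(self) -> bytes:
        return _LAYOUT.pack(self.session_id, self.pg_id,
                            self.priority, self.quality)

    @classmethod
    def unpack(cls, data: bytes) -> "VawRequest":
        return cls(*_LAYOUT.unpack(data))
```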

PLAYING SUBJECTIVE VIDEO. 

[0066] Fig. 14 shows three basic operations which may
be available while playing a subjective video:
revolution, rotation, and zoom. Revolution is defined as
a sequence of viewpoint change operations. A rotation
operation happens at the same viewpoint, with the X-Y
coordinates rotating within the image plane. Zooms,
including zoom-in and zoom-out, are scaling operations
also acting on the same viewpoint.

[0067] In an embodiment, the rotation is considered an
entirely local function, whereas the revolution and zoom
require support from the server. The rotation is
realized by a rotational geometric transform that brings
the original image to the rotated image. This is a
standard mathematical operation and so its description is
omitted for the sake of clarity. The zoom operations are
realized by combining sub-band coding and interpolation
techniques, which are also known to one familiar with
this field. During the zoom operations, if some of the
enhancement layer data is not available locally, a
request is submitted for the same VAW session, same PG
ID, but for more enhancement layers, and this request is
to be dealt with by the server 130 with the highest
priority. Revolution corresponds to a sequence of
viewpoint changes. Its treatment is described below.
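
For reference, the rotational geometric transform mentioned above can
be written for a single coordinate pair as follows. This is the
standard planar rotation, shown as a sketch; the function name and the
choice of a y-up coordinate system are assumptions.

```python
from math import cos, sin, radians

def rotate_point(x, y, angle_deg, cx=0.0, cy=0.0):
    """Rotate (x, y) by angle_deg about the point (cx, cy).

    The rotation is counter-clockwise in a y-up coordinate system.
    """
    a = radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * cos(a) - dy * sin(a),
            cy + dx * sin(a) + dy * cos(a))
```

Applying this mapping to every pixel position of the image at the
current viewpoint (typically with interpolation) yields the rotated
image without any request to the server.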
E-VIEWER.

[0068] The functional components of the E-Viewer
appear, in very simplified form, in Fig. 8. There are
four major function modules: the E-Viewer controller 840,
the geometric functions 850, the image decoder 860, and
the end-user interface 870. The E-Viewer 410 is a
central processor that commands and controls the
operation of the other modules. The geometric functions
850 provide necessary computations for rotation and
zooming operations. The image decoder 860 reconstructs
images from their compressed form. The end-user
interface 870 provides display support and relays and
interprets the end-user's operations during the playing
of subjective video.

[0069] There are three data structures that the
E-Viewer 410 uses to implement its functions: the cache
855, the display buffer 865, and the viewpoint map 830.
The cache holds compressed image data downloaded from the
server. Depending on the size of cache 855, it may hold
the whole still subjective contents (in compressed form)
for a VAW session, or only part of it. PG data that
exceeds the capacity of cache 855 can be stored in a mass
storage device 810 such as a disk. The display buffer
865 holds reconstructed image data to be sent to display
875. The viewpoint map 830 is used by both the E-Viewer
controller 840 and the scheduler 820. Whenever a data
pack is received, the E-Viewer 410 updates the status of
the Local Availability field for the corresponding PG in
the viewpoint map 830.
[0070] The cache 855 plays an important role in the
subjective video streaming process. After one picture is
decoded and displayed, it will not be discarded, in case
the end-user revisits this viewpoint in the future.
However, keeping all the pictures in decoded form in
memory is expensive. The cache 855 will therefore keep
all the downloaded pictures in their compressed form in
memory. Whenever a picture is revisited, the E-Viewer
410 simply decodes it again and displays it. Note that
we are assuming that the decoding process is fast, which
is true for most modern systems.
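
The trade-off described above, keeping pictures compressed and decoding
on every visit, can be sketched as follows. The class name is
hypothetical, and `zlib` is used purely as a stand-in for the image
decoder 860, whose actual codec is not specified here.

```python
import zlib

class CompressedCache:
    """Keeps downloaded pictures in compressed form only."""

    def __init__(self):
        self._store = {}  # PG ID -> compressed bytes

    def put(self, pg_id, compressed):
        self._store[pg_id] = compressed

    def show(self, pg_id):
        """Decode on every visit; return None if the PG is not cached."""
        data = self._store.get(pg_id)
        if data is None:
            return None
        # Stand-in for the real image decoder: decoded data is produced
        # on demand and never stored back into the cache.
        return zlib.decompress(data)
```

A revisit therefore costs one extra decode but no extra memory, which
matches the assumption that decoding is fast.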

[0071] The decoding process is a process opposite to
the encoding process that forms the VAW data. The data
input to the decoder 860 may be either from the remote
server 130 (via Internet) or from a local disk 810 file.
However, the decoder 860 does not differentiate the
source of data; it simply decodes the compressed data
into raw form.
E-VIEWER CONTROLLER.

[0072] Fig. 15 illustrates the operation principle of
the E-Viewer controller 840.

[0073] At the very beginning, the E-Viewer 410 is
launched by the first request on a new VAW session
through the Internet browser 150. The display 875 is
initially disabled so that the display window will be
blank. This is a period when the E-Viewer 410 waits for
the first batch of data to come from the server 130. The
E-Viewer 410 will display a message to inform the
end-user that it is buffering data. In an embodiment,
during this period, the origin PG and its surrounding PGs
are downloaded.

[0074] During this initialization stage the E-Viewer
controller 840 will also clear the cache 855 and display
buffer 865. Once the session description is received,
the controller 840 will initialize the viewpoint map 830
based on the received information. All the PGs will be
marked non-local initially, and the current viewpoint
pointer is at the origin viewpoint. (Given this
information the scheduler 820 can start its job.)

[0075] Once the first batch of data packs is received,
the display will be enabled so that the end user will see
the picture of the origin viewpoint on the screen 875.
Then the controller 840 enters a loop. In this loop, the
controller 840 deals with the user input and updates the
viewpoint map 830. In synchronous transmission mode,
upon completion of a data pack, the controller will issue
a synchronization signal to the scheduler 820 so that the
scheduler 820 can submit a new request.

[0076] The E-Viewer 410 preferably provides four
commands for the end user to use in playing the
subjective video: revolution, rotation, zoom, and stop.
For each of these commands there is a processor to manage
the work. In the revolution mode, the processor takes
the new location of the wanted viewpoint specified by the
user through an input device 880 such as a mouse. Then
it finds, for this wanted viewpoint, an actual viewpoint
from the viewpoint map 830, and marks it as the new
current viewpoint. In the rotation mode, the controller
calls the geometric functions 850 and applies them to the
image at the current viewpoint. The rotation operation
can be combined with the revolution operation.
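
The step of finding an actual viewpoint for a wanted location can be
sketched as a nearest-neighbour lookup. The function name, the
representation of the viewpoint map as a dictionary of 2-D coordinates,
and the squared Euclidean distance are assumptions for illustration.

```python
def snap_to_viewpoint(wanted, viewpoints):
    """Return the ID of the actual viewpoint closest to `wanted`.

    `wanted` is an (x, y) location from the input device; `viewpoints`
    maps a viewpoint ID to its (x, y) position in the viewpoint map.
    """
    wx, wy = wanted

    def squared_distance(vid):
        vx, vy = viewpoints[vid]
        return (vx - wx) ** 2 + (vy - wy) ** 2

    # The ID is used as a tie-breaker so the result is deterministic.
    return min(viewpoints, key=lambda vid: (squared_distance(vid), vid))
```

The returned ID would then be marked as the new current viewpoint.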

[0077] If a stop command is received, the controller
840 will release all data structures initially opened by
it, kill all launched control tasks, and close the
E-Viewer display window.


SCALABLE TRANSMISSION.

[0078] In order to support different applications with
different network bandwidth, the scheduler 820 and the
E-Viewer controller 840 can be programmed to achieve the
following progressive transmission schemes to be used
with the various embodiments.
RESOLUTION SCALABILITY. 

[0079] As described above, when the still content of a
subjective video is produced, the image information can
be encoded and organized as one base layer 270 (see Fig.
2) and several enhancement layers 280. If a user is
using a fast Internet connection, he/she may ask for a
session with a big image and more details. He/she would
choose a smaller frame size if the Internet access is via
a slow dialup connection.

[0080] Resolution scalability can also be used in an
alternative way. Since the scheduler 820 can specify the
quality layers it wants when it submits a request, it can
be easily programmed such that, for all viewpoints being
visited for the first time, only the base layer data is
downloaded. Then, whenever the viewpoint is revisited,
more layers are downloaded. This configuration allows
the coarse information about the scene to be downloaded
at a fast speed, and provides a visual effect of
progressive refinement as the viewer revolves the video.
This configuration is bandwidth-smart and it also fits
visual psychology: the more a user revisits a specific
viewpoint (which could strongly reflect his/her interest
in that viewpoint), the better the image quality is for
that viewpoint.
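
The revisit-driven refinement can be sketched as a small rule that
decides which layers to request on each visit. The function name and
the choice of one additional layer per revisit are assumptions; the
text above only states that more layers are downloaded on a revisit.

```python
def layers_to_request(visits, have_layers, total_layers):
    """Return the layer indices to request on this visit.

    Layer 0 is the base layer; higher indices are enhancement layers.
    `visits` counts visits to the viewpoint including the current one,
    and `have_layers` is the number of layers already held locally.
    """
    # Assumed rule: first visit wants the base layer only, and each
    # revisit wants one more layer, up to the total available.
    wanted = min(visits, total_layers)
    return list(range(have_layers, wanted))
```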
VIEWPOINT SCALABILITY. 

[0081] A user with slow Internet access can skip
several viewpoints during the revolution. This is
referred to as fast revolution in subjective video. One
extreme case is that only five PGs at five special
viewpoints are downloaded for the first batch of data
packs for transmission. With these PGs, the user can at
least navigate among the five orthogonal viewpoints.
Then, as the download process evolves, more PGs in
between the existing local PGs will be available, so that
the operation of revolution will become smoother
(Fig. 16).

[0082] Another possible realization of viewpoint
scalability is to download only the C-image of each PG
first. After the C-images of all PGs are completed, the
S-images are then downloaded.
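
The C-image-first ordering can be sketched as a two-pass download list.
The function name and the tuple representation of a download task are
hypothetical and are used for illustration only.

```python
def c_first_order(s_image_counts):
    """Build a download order with all C-images before any S-image.

    `s_image_counts` maps a PG ID to the number of S-images in that PG.
    Each task is a (pg_id, kind, index) tuple, where kind is "C" or "S".
    """
    pg_ids = sorted(s_image_counts)
    # Pass 1: one C-image per PG.
    order = [(pg, "C", 0) for pg in pg_ids]
    # Pass 2: the S-images of every PG.
    order += [(pg, "S", i) for pg in pg_ids
              for i in range(s_image_counts[pg])]
    return order
```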
LOCAL PLAYBACK COMPATIBILITY. 

[0083] Locally stored VAW files 200 may be replayed
from disk 810.
STREAMING PANORAMIC CONTENTS.

[0084] Fig. 17 shows that the described subjective
video streaming methods and system are also applicable to
streaming panoramic contents.
[0085] Panoramic image contents give the viewer the
visual experience of being completely immersed in a
visual atmosphere. Panoramic content is produced by
collecting the pictures taken at a single viewpoint
towards all possible directions. If there is no optical
change in the visual atmosphere during the time the
pictures are taken, then the panoramic content forms a
"spherical still image". Viewing this panoramic content
corresponds to moving a peeking window around on the
sphere. It can be readily understood that viewing a
panoramic content is a special subjective video playing
process, and that panoramic content is just the other
extreme in contrast to multi-viewpoint content.

[0086] In observing this relationship, it is claimed
here that the invented subjective video streaming methods
and system can be directly applied to panoramic contents
without substantial modification. The only major change
to be made is to turn all lenses of the multi-viewpoint
capturing device 810 1 from pointing inwards to pointing
outwards.
CONCLUSION.

[0087] It will be apparent to those skilled in the art
that various modifications can be made without departing
from the scope or spirit of the invention, and it is
intended that the present invention cover such
modifications and variations in accordance with the scope
of the appended claims and their equivalents.
THERE IS CLAIMED:

1. A method of supporting subjective video at a server,
comprising:
receiving a request relating to subjective video
content;
accessing a view at will file corresponding to said
subjective video content;
in response to said request relating to said subjective
video content, providing initial image data relating
to an origin processing group of said view at will
file;
receiving a subsequent request relating to said
subjective video content;
determining, from said subsequent request, a processing
group identifier; and
based on said processing group identifier, providing
subsequent image data relating to a processing group
identified by said processing group identifier;
wherein said initial image data and said subsequent
image data comprise coded image data not derived from
a three-dimensional model.

2. The method of supporting subjective video at a
server as set forth in claim 1, further comprising, after
said accessing of said view at will file, obtaining from
said view at will file an offset table, wherein said
offset table indicates a start of each set of image data
relating to each processing group in said view at will
file.


3. The method of supporting subjective video at a
server as set forth in claim 2, wherein said view at will
file comprises:
a file header and processing group code streams;
said file header comprising said offset table;
each of said processing group code streams comprising:
a respective processing group header indicating a
processing group, an identifier relating to a
control camera in said processing group, and
coding parameters; and
a processing group data body, comprising:
a code stream relating to an image provided by
said control camera, defining a C-image; and
code streams relating to images provided by each
of a plurality of surrounding cameras in said
processing group, defining S-images.


4. The method of supporting subjective video at a
server as set forth in claim 3, wherein said code streams
relating to said C-image and said S-images further
comprise a base layer and a set of enhancement layers,
said base layer containing information of said image data
at a coarse level, and said enhancement layers containing
information at finer levels of resolution.

5. A method of supporting subjective video at a client,
comprising:
initiating a streaming process by sending a request
relating to subjective video content;
receiving initial image data relating to an origin
processing group of said view at will file;
sending a subsequent request relating to a different
processing group with respect to said subjective
video content;
receiving subsequent image data relating to said
different processing group;
wherein said initial image data and said subsequent
image data comprise coded image data not derived from
a three-dimensional model.

6. The method of supporting subjective video at said
client as set forth in claim 5, further comprising:
providing said client with a streaming client and a
viewer, said streaming client including a streaming
scheduler, said viewer including a viewer controller,
a display buffer, an end-user interface, a cache, and
an image decoder;
providing said client with a viewpoint map, shared by
said streaming client and said viewer;
receiving, in accordance with said initial image data,
session description information; and
initializing said viewpoint map based on said session
description information;
wherein:
said sending of said initial request activates said
streaming scheduler;
said sending of said subsequent request is performed
by said streaming scheduler;
said streaming scheduler identifies a selected
processing group identifier based on user input;
said streaming scheduler updates said viewpoint map
based on said received image data to indicate
local availability with respect to image data on a
processing group basis;
under control of said viewer controller:
said cache receives said image data in a
compressed form;
said image decoder decodes said image data in
said compressed form to provide decoded image
data; and
said end-user interface receives said coded
image data from said display buffer for
display.


7. The method of supporting subjective video at said
client as set forth in claim 6, wherein said viewer
further comprises a geometric functions module for
supporting user manipulation operations.

8. The method of supporting subjective video at said
client as set forth in claim 1, wherein said user
manipulation operations include zoom, rotation, and
revolution.

9. The method of supporting subjective video at said
client as set forth in claim 8, wherein said rotation is
performed as a solely local function, using a
two-dimensional image plane, at said client without
support from a server.

10. The method of supporting subjective video at said
client as set forth in claim 8, wherein said zoom is
performed as a function using support from said client
and a remote server using resolution re-scaling
operations.

11. The method of supporting subjective video at said
client as set forth in claim 5, wherein said steps of
sending said subsequent request and receiving said
subsequent image data are performed in a synchronous
manner.

12. The method of supporting subjective video at said
client as set forth in claim 5, wherein said steps of
sending said subsequent request and receiving said
subsequent image data are performed in an asynchronous
manner.

13. The method of supporting subjective video at said
client as set forth in claim 6, wherein said streaming
scheduler streams image data according to a wave-front
model.

14. The method of supporting subjective video at said
client as set forth in claim 13, wherein said wave-front
model comprises:
when a change of viewpoint is not indicated by a user,
said streaming scheduler requests image data relating
to processing groups in proximity to a present
processing group, and
when a change of viewpoint is indicated by said user,
said streaming scheduler requests image data relating
to a processing group at said viewpoint and also
processing groups in proximity thereto.

15. The method of supporting subjective video at said
client as set forth in claim 13, wherein said wave-front
model comprises arranging the order of image download
based on the priority of a download task being inversely
proportional to a distance between a current viewpoint
and a viewpoint where said download task is defined.

16. The method of supporting subjective video at said
client as set forth in claim 6, wherein said streaming
scheduler streams image data according to a resolution
scalability scheduling policy.

17. The method of supporting subjective video at said
client as set forth in claim 16, wherein said resolution
scalability scheduling policy comprises:
determining a bandwidth of a local communication
connection;
requesting one or more enhancement layers based on said
bandwidth determination.

18. The method of supporting subjective video at said
client as set forth in claim 16, wherein said resolution
scalability scheduling policy comprises initially
downloading only a base layer of said image data relating
to a given viewpoint, monitoring user interaction to
determine whether said given viewpoint is revisited, and,
when said monitoring indicates that said given viewpoint
is revisited, downloading one or more enhancement layers.

19. The method of supporting subjective video at said
client as set forth in claim 8, wherein, in response to
an indication of said revolution operation, said
streaming scheduler streams image data by skipping
processing groups in accordance with an indicated speed
of rotation.

20. The method of supporting subjective video at said
client as set forth in claim 6, further comprising
storing downloaded compressed image data locally and, in
response to a request for re-displaying said locally
stored downloaded compressed image data, performing the
steps of loading said locally stored downloaded
compressed image data into said cache; decoding said
locally stored downloaded compressed image data with said
image decoder to provide said decoded image data; and
providing said decoded image data to said end-user
interface via said display buffer for display.


21. The method of supporting subjective video at said
client as set forth in claim 5, wherein said image data
is panoramic image data.

22. The method of supporting subjective video at said
client as set forth in claim 5, wherein said image data
is multi-viewpoint image data.

23. The method of supporting subjective video at said
client as set forth in claim 5, wherein said viewer and
said streaming client are implemented as plug-ins to a
browser.

24. An interactive multi-viewpoint subjective video
streaming system, comprising a client and a passive
streaming server, said client providing to said server
selection commands selecting from a plurality of
viewpoints relating to a given scene, said server
responding to said commands of said client by providing
to said client corresponding image data for said selected
one of said plurality of viewpoints.